Spatial Disparities Studio

Introduction

Studio

  • 2023-11-27 Overview of Spatial Disparities Research Design
  • 2023-11-29 Geoprocessing for Disparities Research
  • 2023-12-04 Areal Interpolation for Disparities Research
  • 2023-12-06 Integration of Spatial Analysis and Statistical Inference

Research Question

In San Diego County, are there ethnic, racial, or socioeconomic disparities in exposure to freeway noise and pollution?

Research Design

Three Components

  1. Where are the areas of high (low) exposure to freeway pollution?
  2. Who lives in those areas?
  3. Statistical tests for differences between the populations of these areas.

Research Design

Today

  1. Who lives in those areas?

Today

We will focus on areal interpolation and additional geoprocessing to generate some of the spatially defined variables we need for our disparities analysis.

import geopandas as gpd
buffer = gpd.read_file('buffer.geojson', driver='GeoJSON')
buffer.plot()
<Axes: >

tracts = gpd.read_file("sdtracts.geojson", driver='GeoJSON')
tracts.plot()
<Axes: >

m = tracts.explore(tooltip=['per_capita_income'])
buffer.explore(m=m, color='k')

Which tracts intersect the buffer?

buffer.shape
(1, 1)
tracts.shape
(628, 159)
buffer.sindex.query(tracts.geometry, predicate='intersects')
array([[  0,   1,   2,   3,   4,   5,   6,   7,  10,  11,  12,  13,  15,
        ..., 621, 622, 623, 624, 625, 626, 627],
       [  0,   0,   0, ...,   0,   0,   0]])
intersecting_idxs = buffer.sindex.query(tracts.geometry, predicate='intersects')[0]
len(intersecting_idxs)
492
tracts.shape
(628, 159)
import numpy as np
intersecting = np.array([False] * tracts.shape[0])
intersecting[:10]
array([False, False, False, False, False, False, False, False, False,
       False])
intersecting[intersecting_idxs] = True
intersecting[:10]
array([ True,  True,  True,  True,  True,  True,  True,  True, False,
       False])
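The same boolean mask can also be built directly with `GeoSeries.intersects`, which broadcasts a single geometry against every row. A minimal sketch on toy geometries (the studio data are local files, so hypothetical unit squares stand in here):

```python
import geopandas as gpd
from shapely.geometry import box

# Three squares; only the first two touch the "buffer" square.
toy_tracts = gpd.GeoDataFrame(geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)])
toy_buffer = box(0.5, 0.5, 1.5, 1.5)

# intersects() broadcasts one geometry against every row of the GeoSeries,
# returning a boolean Series with no index bookkeeping required.
mask = toy_tracts.geometry.intersects(toy_buffer)
print(mask.tolist())  # [True, True, False]
```

For a handful of geometries this is simpler; `sindex.query` pays off on large layers because the spatial index prunes candidate pairs first.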
tracts['intersecting'] = intersecting
base = tracts.plot(column='intersecting', categorical=True, legend=True, figsize=(16,9))
buffer.plot(ax=base, color='k', alpha=0.5)
<Axes: >

How many tracts intersect the buffer?

tracts.groupby(by='intersecting').count()
geoid n_mexican_pop n_cuban_pop n_puerto_rican_pop n_russian_pop n_italian_pop n_german_pop n_irish_pop n_scandaniavian_pop n_foreign_born_pop ... p_poverty_rate_over_65 p_poverty_rate_children p_poverty_rate_white p_poverty_rate_black p_poverty_rate_hispanic p_poverty_rate_native p_poverty_rate_asian year _dvar geometry
intersecting
False 136 136 136 136 136 136 136 136 136 136 ... 136 136 136 136 136 136 136 136 136 136
True 492 492 492 492 492 492 492 492 492 492 ... 486 486 486 486 486 486 486 492 492 492

2 rows × 159 columns

How many people live in tracts that intersect the buffer?

tracts[['intersecting', 'n_total_pop']].groupby(by='intersecting').sum()
n_total_pop
intersecting
False 662314.0
True 2653759.0

Which tracts have their centroids in the buffer?

centroids = tracts.centroid
tracts['centroids'] = centroids
tracts = tracts.set_geometry('centroids')
tracts.plot()
<Axes: >
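The geometry-column swap above can be sketched on a toy frame: `set_geometry` makes the centroid column the active geometry while the polygon column stays available for switching back later.

```python
import geopandas as gpd
from shapely.geometry import box

gdf = gpd.GeoDataFrame({'name': ['a', 'b']}, geometry=[box(0, 0, 2, 2), box(4, 0, 6, 2)])
gdf['centroids'] = gdf.geometry.centroid   # point column alongside the polygons
pts = gdf.set_geometry('centroids')        # centroids become the active geometry
print(pts.geom_type.tolist())              # ['Point', 'Point']
```

Subsequent spatial predicates (like the `intersects` query below) now operate on the points, which is what turns "tract intersects buffer" into "tract centroid falls in buffer".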

buffer.sindex.query(tracts.geometry, predicate='intersects')
array([[ 16,  22,  30,  82,  87,  89,  94,  95,  97, 103, 105, 109, 112,
        114, 133, 136, 137, 138, 139, 181, 183, 195, 201, 212, 241, 256,
        257, 263, 267, 268, 279, 284, 285, 294, 349, 357, 380, 384, 385,
        402, 419, 450, 452, 457, 465, 468, 470, 471, 477, 479, 480, 483,
        498, 535, 556, 565, 573, 575, 577, 578, 579, 580, 581, 601, 603,
        621, 625],
       [  0,   0,   0, ...,   0,   0,   0]])
intersecting_idxs = buffer.sindex.query(tracts.geometry, predicate='intersects')[0]
len(intersecting_idxs)
67
intersecting = np.array([False] * tracts.shape[0])
m = buffer.explore(color='grey')
tracts.explore(m=m)
tracts.iloc[intersecting_idxs].explore(color='red', m=m)
intersecting[intersecting_idxs] = True
intersecting[:10]
array([False, False, False, False, False, False, False, False, False,
       False])
tracts['cent_in_buffer'] = intersecting
tracts = tracts.set_geometry('geometry')
base = tracts.plot(column='cent_in_buffer', categorical=True, legend=True, figsize=(16,9))
buffer.plot(ax=base, color='k', alpha=0.5)
<Axes: >

How many tracts have their centroid within the buffer?

tracts.groupby(by='cent_in_buffer').count()
geoid n_mexican_pop n_cuban_pop n_puerto_rican_pop n_russian_pop n_italian_pop n_german_pop n_irish_pop n_scandaniavian_pop n_foreign_born_pop ... p_poverty_rate_white p_poverty_rate_black p_poverty_rate_hispanic p_poverty_rate_native p_poverty_rate_asian year _dvar geometry intersecting centroids
cent_in_buffer
False 561 561 561 561 561 561 561 561 561 561 ... 556 556 556 556 556 561 561 561 561 561
True 67 67 67 67 67 67 67 67 67 67 ... 66 66 66 66 66 67 67 67 67 67

2 rows × 161 columns

How many people live in tracts that have their centroid within the buffer?

tracts[['cent_in_buffer', 'n_total_pop']].groupby(by='cent_in_buffer').sum()
n_total_pop
cent_in_buffer
False 3012549.0
True 303524.0

Areal Interpolation

tracts.n_total_pop.sum()
3316073.0
from tobler.area_weighted import area_interpolate

Interpolation of an Extensive Variable

Here we want to interpolate an extensive variable (a count) from a set of source polygons (indexed with \(i\)) into target polygons, indexed with \(j\).

Define \(v_j\) as the value for the variable that we wish to interpolate to polygon \(j\).

We estimate the value \(v_j\) by allocating a share of the count variable measured at each of the source polygons intersecting the target polygon: \[v_j = \sum_i v_i w_{i,j}\] where the weights are defined as:

\[w_{i,j} = a_{i,j} / \sum_k a_{i,k}\]

The weight is simply the share of polygon \(i\)'s area that intersects with target polygon \(j\).
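A hand-worked instance of these weights, with hypothetical numbers: a source tract with population 100 intersects two targets with areas 1 and 2, so the count is split in proportion to the intersected areas.

```python
pop_i = 100.0
a_ij = [1.0, 2.0]                  # areas of source i's intersections with targets j = 0, 1
denom = sum(a_ij)                  # \sum_k a_{i,k} = 3.0
shares = [pop_i * a / denom for a in a_ij]
print(shares)                      # [33.33..., 66.66...]; the shares sum to pop_i
```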

in_buffer = area_interpolate(tracts, buffer, extensive_variables=['n_total_pop'])
in_buffer.head()
n_total_pop geometry
0 2.653759e+06 MULTIPOLYGON (((475814.243 3628586.622, 475809...

This is the total population of the tracts that intersect the buffer.

Note that the default is to assume the total value for each source region is allocated proportionately across the target zones that it intersects. This may, or may not, be what you want.

In this case it is not.

area_interpolate?
Signature:
area_interpolate(
    source_df,
    target_df,
    extensive_variables=None,
    intensive_variables=None,
    table=None,
    allocate_total=True,
    spatial_index='auto',
    n_jobs=1,
    categorical_variables=None,
)
Docstring:
Area interpolation for extensive, intensive and categorical variables.
Parameters
----------
source_df : geopandas.GeoDataFrame
target_df : geopandas.GeoDataFrame
extensive_variables : list
    [Optional. Default=None] Columns in dataframes for extensive variables
intensive_variables : list
    [Optional. Default=None] Columns in dataframes for intensive variables
table : scipy.sparse.csr_matrix
    [Optional. Default=None] Area allocation source-target correspondence
    table. If not provided, it will be built from `source_df` and
    `target_df` using `tobler.area_interpolate._area_tables_binning`
allocate_total : boolean
    [Optional. Default=True] True if total value of source area should be
    allocated. False if denominator is area of i. Note that the two cases
    would be identical when the area of the source polygon is exhausted by
    intersections. See Notes for more details.
spatial_index : str
    [Optional. Default="auto"] Spatial index to use to build the
    allocation of area from source to target tables. It currently support
    the following values:
        - "source": build the spatial index on `source_df`
        - "target": build the spatial index on `target_df`
        - "auto": attempts to guess the most efficient alternative.
          Currently, this option uses the largest table to build the
          index, and performs a `bulk_query` on the shorter table.
    This argument is ignored if n_jobs>1 (or n_jobs=-1).
n_jobs : int
    [Optional. Default=1] Number of processes to run in parallel to
    generate the area allocation. If -1, this is set to the number of CPUs
    available. If `table` is passed, this is ignored.
categorical_variables : list
    [Optional. Default=None] Columns in dataframes for categorical variables
Returns
-------
estimates : geopandas.GeoDataFrame
     new geodaraframe with interpolated variables as columns and target_df geometry
     as output geometry
Notes
-----
The assumption is both dataframes have the same coordinate reference system.
For an extensive variable, the estimate at target polygon j (default case) is:
.. math::
 v_j = \sum_i v_i w_{i,j}
 w_{i,j} = a_{i,j} / \sum_k a_{i,k}
If the area of the source polygon is not exhausted by intersections with
target polygons and there is reason to not allocate the complete value of
an extensive attribute, then setting allocate_total=False will use the
following weights:
.. math::
 v_j = \sum_i v_i w_{i,j}
 w_{i,j} = a_{i,j} / a_i
where a_i is the total area of source polygon i.
For an intensive variable, the estimate at target polygon j is:
.. math::
 v_j = \sum_i v_i w_{i,j}
 w_{i,j} = a_{i,j} / \sum_k a_{k,j}
For categorical variables, the estimate returns ratio of presence of each
unique category.
File:      /opt/tljh/user/lib/python3.9/site-packages/tobler/area_weighted/area_interpolate.py
Type:      function
in_buffer1 = area_interpolate(tracts, buffer, extensive_variables=['n_total_pop'],
                             allocate_total=False)
in_buffer1.head()
n_total_pop geometry
0 756217.589538 MULTIPOLYGON (((475814.243 3628586.622, 475809...

This is an estimate of only the population that is inside the buffer for tracts that intersect the buffer.
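The difference between the two weightings can be checked by hand (a sketch of the formulas with hypothetical numbers, not the tobler internals): with `allocate_total=True` the denominator is the tract's total intersected area, while with `allocate_total=False` it is the tract's own area, so the two diverge exactly when part of the tract falls outside every target.

```python
pop_i, a_i = 100.0, 4.0
a_ij = 1.0                        # area of the tract/buffer intersection
overlaps = [1.0, 3.0]             # intersections with the buffer and with everything else

# allocate_total=True: the full count is spread over the intersections
w_true = a_ij / sum(overlaps)     # 0.25, since the tract is fully covered
# allocate_total=False: weight is the intersected share of the tract itself
w_false = a_ij / a_i              # also 0.25 -- identical when coverage is exhaustive

# If 2 of the 4 area units fall outside every target, the estimates diverge:
overlaps_gap = [1.0, 1.0]
print(pop_i * a_ij / sum(overlaps_gap), pop_i * a_ij / a_i)  # 50.0 25.0
```

This matches the docstring's note that the two cases are identical only when the source polygon's area is exhausted by intersections.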

Geoprocessing

Make two regions in the GeoDataFrame: inside the buffer and outside it.

tracts.head()
geoid n_mexican_pop n_cuban_pop n_puerto_rican_pop n_russian_pop n_italian_pop n_german_pop n_irish_pop n_scandaniavian_pop n_foreign_born_pop ... p_poverty_rate_black p_poverty_rate_hispanic p_poverty_rate_native p_poverty_rate_asian year _dvar geometry intersecting centroids cent_in_buffer
0 06073000100 278.0 19.0 9.0 42.0 125.0 86.0 65.0 11.0 389.0 ... 0.00000 1.260912 0.000000 0.0 2019 1 MULTIPOLYGON (((481742.259 3623897.455, 481760... True POINT (482586.188 3623920.893) False
1 06073000201 95.0 11.0 0.0 23.0 44.0 32.0 64.0 0.0 163.0 ... 0.00000 1.322052 0.581703 0.0 2019 1 MULTIPOLYGON (((483245.448 3624434.672, 483329... True POINT (483758.990 3624029.501) False
2 06073000202 372.0 36.0 36.0 63.0 148.0 160.0 37.0 0.0 832.0 ... 0.00000 0.848025 0.000000 0.0 2019 1 MULTIPOLYGON (((482758.309 3623111.883, 482778... True POINT (483591.005 3623075.519) False
3 06073000300 757.0 0.0 43.0 7.0 87.0 209.0 192.0 0.0 499.0 ... 0.90945 2.412021 0.000000 0.0 2019 1 MULTIPOLYGON (((484201.416 3623470.650, 484223... True POINT (484794.769 3623005.762) False
4 06073000400 701.0 0.0 11.0 29.0 66.0 61.0 16.0 0.0 615.0 ... 1.37486 1.206510 0.000000 0.0 2019 1 MULTIPOLYGON (((483994.991 3624544.291, 484058... True POINT (484724.410 3623954.290) False

5 rows × 162 columns

county = tracts.dissolve(by='_dvar')
county.plot()
<Axes: >
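`dissolve` groups rows by a column and unions their geometries; since `_dvar` is constant, every tract collapses into a single county outline. A toy sketch with two adjacent squares:

```python
import geopandas as gpd
from shapely.geometry import box

gdf = gpd.GeoDataFrame({'key': [1, 1]}, geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)])
merged = gdf.dissolve(by='key')    # one row per key; geometries unioned
print(len(merged), merged.geometry.iloc[0].area)  # 1 2.0
```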

type(county)
geopandas.geodataframe.GeoDataFrame
county.shape
(1, 161)
type(buffer)
geopandas.geodataframe.GeoDataFrame
buffer.shape
(1, 1)
buffer.reset_index(inplace=True)
county.reset_index(inplace=True)
outside = county.difference(buffer.geometry)
outside.plot()
<Axes: >

type(outside)
geopandas.geoseries.GeoSeries
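`difference` subtracts one layer's geometry from another, row-wise by index; here it carves the buffer out of the county, leaving the "outside" region. On toy boxes:

```python
import geopandas as gpd
from shapely.geometry import box

county = gpd.GeoSeries([box(0, 0, 4, 4)])   # 16 area units
buf = gpd.GeoSeries([box(0, 0, 2, 2)])      # 4 area units in one corner
outside = county.difference(buf)            # county minus buffer
print(outside.area.iloc[0])                 # 12.0
```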
import pandas as pd
joined = pd.concat([buffer.geometry, outside])  # buffer polygon first, then the outside region
joined = gpd.GeoDataFrame(geometry=joined)
joined.plot()
<Axes: >

joined['buffer'] = [True, False]
joined.head()
geometry buffer
0 MULTIPOLYGON (((475814.243 3628586.622, 475809... True
0 MULTIPOLYGON (((480756.394 3599073.688, 480752... False
joined.plot(column='buffer', categorical=True, legend=True)
<Axes: >

estimate = area_interpolate(tracts, joined, extensive_variables=['n_total_pop'])
estimate.head()
n_total_pop geometry
0 7.562176e+05 MULTIPOLYGON (((475814.243 3628586.622, 475809...
1 2.559855e+06 MULTIPOLYGON (((480756.394 3599073.688, 480752...
estimate['buffer'] = joined['buffer'].values
estimate.head()
n_total_pop geometry buffer
0 7.562176e+05 MULTIPOLYGON (((475814.243 3628586.622, 475809... True
1 2.559855e+06 MULTIPOLYGON (((480756.394 3599073.688, 480752... False

Interpolation of an Intensive Variable

Here we want to interpolate an intensive variable (a rate or proportion) from a set of source polygons (indexed with \(i\)) into target polygons, indexed with \(j\).

Define \(v_j\) as the value for the variable that we wish to interpolate to polygon \(j\).

We estimate the value \(v_j\) as a weighted average of the observed values in the source polygons intersecting the target polygon: \[v_j = \sum_i v_i w_{i,j}\] where the weights are defined as:

\[w_{i,j} = a_{i,j} / \sum_k a_{k,j}\]

The weight is the share of target polygon \(j\)'s intersected area that is contributed by source polygon \(i\); the denominator sums the intersections between target \(j\) and all contributing source polygons.

There are two general cases with interpolation of intensive variables:

  • one only has the intensive variable itself (e.g., median tract household income)
  • one also has two extensive variables (e.g., total population and total income) that can first be interpolated and then used to derive an estimate of the intensive variable
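A hand-worked instance of the intensive weights, with hypothetical numbers: target \(j\) is covered by two source tracts with intersection areas 1 and 3 and incomes 40,000 and 80,000, so the direct estimate is an area-weighted average pulled toward the larger overlap.

```python
a_kj = [1.0, 3.0]                     # intersection areas of the two sources with target j
v_i = [40_000.0, 80_000.0]            # intensive values in the two sources
denom = sum(a_kj)                     # \sum_k a_{k,j} = 4.0
v_j = sum(v * a / denom for v, a in zip(v_i, a_kj))
print(v_j)                            # 70000.0
```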
estimate = area_interpolate(tracts, joined, extensive_variables=['n_total_pop',
                                                                'n_nonhisp_white_persons'])
estimate.head()
n_total_pop n_nonhisp_white_persons geometry
0 7.562176e+05 3.215112e+05 MULTIPOLYGON (((475814.243 3628586.622, 475809...
1 2.559855e+06 1.189245e+06 MULTIPOLYGON (((480756.394 3599073.688, 480752...
estimate['p_white'] = estimate.n_nonhisp_white_persons / estimate.n_total_pop
estimate['buffer'] = joined['buffer'].values
estimate.head()
n_total_pop n_nonhisp_white_persons geometry p_white buffer
0 7.562176e+05 3.215112e+05 MULTIPOLYGON (((475814.243 3628586.622, 475809... 0.425157 True
1 2.559855e+06 1.189245e+06 MULTIPOLYGON (((480756.394 3599073.688, 480752... 0.464575 False
tracts.n_nonhisp_white_persons.sum() / tracts.n_total_pop.sum()
0.45558586918924887
estimate = area_interpolate(tracts, joined, extensive_variables=['n_total_pop',
                                                                'n_nonhisp_white_persons',
                                                                'n_nonhisp_black_persons',
                                                                'n_hispanic_persons'])
estimate['p_white'] = estimate.n_nonhisp_white_persons / estimate.n_total_pop
estimate['p_black'] = estimate.n_nonhisp_black_persons / estimate.n_total_pop
estimate['p_hispanic'] = estimate.n_hispanic_persons / estimate.n_total_pop
estimate['buffer'] = joined['buffer'].values
estimate.head()
n_total_pop n_nonhisp_white_persons n_nonhisp_black_persons n_hispanic_persons geometry p_white p_black p_hispanic buffer
0 7.562176e+05 3.215112e+05 35412.835692 292432.788563 MULTIPOLYGON (((475814.243 3628586.622, 475809... 0.425157 0.046829 0.386705 True
1 2.559855e+06 1.189245e+06 120671.164353 825084.214846 MULTIPOLYGON (((480756.394 3599073.688, 480752... 0.464575 0.047140 0.322317 False
estimate = area_interpolate(tracts, joined, extensive_variables=['n_total_pop',
                                                                'n_nonhisp_white_persons',
                                                                'n_nonhisp_black_persons',
                                                                'n_hispanic_persons'],
                            intensive_variables=['median_household_income'])
/opt/tljh/user/lib/python3.9/site-packages/tobler/util/util.py:53: UserWarning: nan values in variable: median_household_income, replacing with 0
  warn(f"nan values in variable: {column}, replacing with 0")
estimate.head()
n_total_pop n_nonhisp_white_persons n_nonhisp_black_persons n_hispanic_persons median_household_income geometry
0 7.562176e+05 3.215112e+05 35412.835692 292432.788563 77579.151075 MULTIPOLYGON (((475814.243 3628586.622, 475809...
1 2.559855e+06 1.189245e+06 120671.164353 825084.214846 66429.266067 MULTIPOLYGON (((480756.394 3599073.688, 480752...
tracts.per_capita_income
0      97661.0
1      70625.0
2      53264.0
3      56624.0
4      54117.0
        ...   
623    80044.0
624    31275.0
625    16034.0
626    60668.0
627        NaN
Name: per_capita_income, Length: 628, dtype: float64
tracts['income'] = tracts.per_capita_income * tracts.n_total_pop  # total tract income, used below to derive a second estimate of per capita income
estimate = area_interpolate(tracts, joined, extensive_variables=['n_total_pop',
                                                                'n_nonhisp_white_persons',
                                                                'n_nonhisp_black_persons',
                                                                'n_hispanic_persons',
                                                                'income'],
                            intensive_variables=['median_household_income',
                                                'per_capita_income'])
/opt/tljh/user/lib/python3.9/site-packages/tobler/util/util.py:53: UserWarning: nan values in variable: income, replacing with 0
  warn(f"nan values in variable: {column}, replacing with 0")
/opt/tljh/user/lib/python3.9/site-packages/tobler/util/util.py:53: UserWarning: nan values in variable: median_household_income, replacing with 0
  warn(f"nan values in variable: {column}, replacing with 0")
/opt/tljh/user/lib/python3.9/site-packages/tobler/util/util.py:53: UserWarning: nan values in variable: per_capita_income, replacing with 0
  warn(f"nan values in variable: {column}, replacing with 0")
estimate.head()
n_total_pop n_nonhisp_white_persons n_nonhisp_black_persons n_hispanic_persons income median_household_income per_capita_income geometry
0 7.562176e+05 3.215112e+05 35412.835692 292432.788563 2.759467e+10 77579.151075 37818.115972 MULTIPOLYGON (((475814.243 3628586.622, 475809...
1 2.559855e+06 1.189245e+06 120671.164353 825084.214846 9.865887e+10 66429.266067 32423.814728 MULTIPOLYGON (((480756.394 3599073.688, 480752...
estimate['pci'] = estimate.income / estimate.n_total_pop
estimate.head()
n_total_pop n_nonhisp_white_persons n_nonhisp_black_persons n_hispanic_persons income median_household_income per_capita_income geometry pci
0 7.562176e+05 3.215112e+05 35412.835692 292432.788563 2.759467e+10 77579.151075 37818.115972 MULTIPOLYGON (((475814.243 3628586.622, 475809... 36490.385567
1 2.559855e+06 1.189245e+06 120671.164353 825084.214846 9.865887e+10 66429.266067 32423.814728 MULTIPOLYGON (((480756.394 3599073.688, 480752... 38540.797907

Note that here, our second estimate of per capita income, based on the ratio of two extensive variables (income and n_total_pop), gives a lower estimate in the buffer region than outside the buffer.

Because we had the two extensive variables at hand, we were able to develop this second estimate of the intensive variable pci. We should prefer it to the direct intensive estimate, because the direct estimate builds its weights from the relative area of each intersection rather than from the relative population size.

In the case of median_household_income, however, we are unable to construct a second estimate of this intensive variable, so we have to rely on the direct (rather than derived) estimate.
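The distinction shows up clearly in a two-tract toy case (hypothetical numbers, assuming both tracts fall entirely inside the target so the extensive totals pass through whole): equal overlap areas but very unequal populations make the direct area-weighted estimate miss the population-weighted one.

```python
areas = [1.0, 1.0]                         # equal intersection areas with the target
pops = [100.0, 900.0]                      # very unequal populations
pci = [10_000.0, 50_000.0]                 # per capita income in each tract

# Direct intensive estimate: weights come from relative intersection area
direct = sum(v * a / sum(areas) for v, a in zip(pci, areas))
# Derived estimate: interpolate the two extensives, then take their ratio
income = sum(p * v for p, v in zip(pops, pci))
derived = income / sum(pops)
print(direct, derived)                     # 30000.0 46000.0
```

The derived estimate correctly weights the richer, more populous tract; the direct estimate treats both tracts as equal contributors.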

Save our work

Save our work for the final statistical analysis:

estimate.to_file('estimate.geojson', driver='GeoJSON')